Zero-Shot Certified Defense against Adversarial Patches with Vision Transformers
Adversarial patch attack aims to fool a machine learning model by arbitrarily
modifying pixels within a restricted region of an input image. Such attacks are
a major threat to models deployed in the physical world, as they can be easily
realized by presenting a customized object in the camera view. Defending
against such attacks is challenging due to the arbitrariness of patches, and
existing provable defenses suffer from poor certified accuracy. In this paper,
we propose PatchVeto, a zero-shot certified defense against adversarial patches
based on Vision Transformer (ViT) models. Rather than training a robust model
to resist adversarial patches which may inevitably sacrifice accuracy,
PatchVeto reuses a pretrained ViT model without any additional training, which
can achieve high accuracy on clean inputs while detecting adversarial patched
inputs by simply manipulating the attention map of ViT. Specifically, each
input is tested by voting over multiple inferences with different attention
masks, where at least one inference is guaranteed to exclude the adversarial
patch. The prediction is certifiably robust if all masked inferences reach
consensus, which ensures that any adversarial patch would be detected with no
false negative. Extensive experiments have shown that PatchVeto is able to
achieve high certified accuracy (e.g. 67.1% on ImageNet for 2%-pixel
adversarial patches), significantly outperforming state-of-the-art methods. The
clean accuracy is the same as vanilla ViT models (81.8% on ImageNet) since the
model parameters are directly reused. Meanwhile, our method can flexibly handle
different adversarial patch sizes by simply changing the masking strategy.
Comment: 12 pages, 5 figures
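The voting-and-consensus logic described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: we assume a `classify(image, mask)` callable that runs one ViT inference with a given attention mask applied, and we represent images and masks with placeholder values.

```python
def certified_predict(image, masks, classify):
    """Vote over one masked inference per mask.

    Returns (label, certified). `certified` is True only when every masked
    inference agrees; since at least one mask is guaranteed to exclude the
    adversarial patch, an attack cannot flip the prediction without breaking
    consensus, i.e. without being detected (no false negative).
    """
    votes = [classify(image, m) for m in masks]
    label = max(set(votes), key=votes.count)  # majority-vote label
    certified = len(set(votes)) == 1          # unanimity => certifiably robust
    return label, certified

# Toy usage with stub classifiers: a clean input yields consensus, while an
# input whose patch is hidden by mask "m2" produces a dissenting vote.
masks = ["m0", "m1", "m2"]
clean = certified_predict("img", masks, lambda img, m: "cat")
attacked = certified_predict("img", masks,
                             lambda img, m: "dog" if m == "m2" else "cat")
```

Changing the set of masks is all that is needed to handle a different patch size, which matches the flexibility claimed in the abstract.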
Microbial Communities in Water during Red Tides along the Coast of China-A Case Study of Prorocentrum Donghaiense Red Tide in the East China Sea
Red tides are a major public hazard in the global oceans. The coast of the East China Sea is the sea area where red tide disasters are most frequent and serious in China. In order to accurately grasp the occurrence of red tides in the coastal waters of the East China Sea, and to understand the microbial communities in the water during these events, a dedicated red tide survey of the coastal waters of Zhejiang, China was carried out in June 2018. The results showed that nutrient concentrations of N and P were generally high in this area, and DIN concentrations in most areas exceeded the permitted limit of Chinese seawater quality grade I. There were significant differences in dissolved oxygen, pH, COD, chlorophyll, and phytoplankton abundance during the red tides. During the investigation, red tides were found in the waters near the Yushan Islands: the chlorophyll a content was 42.12 mg/m^3, the cell abundance of phytoplankton was 8.16×10^8 cells/L, and Prorocentrum donghaiense accounted for 98.5% of that abundance. The Illumina MiSeq sequencing platform was used for 16S high-throughput sequencing of the water microorganisms, and a total of 16 bacterial phyla were identified. Proteobacteria was the first dominant phylum, followed by Cyanobacteria and Bacteroidetes. Some differences in bacterial community composition between the HAB waters and the nearby seawater were observed: the predominant bacteria in the red tide occurrence area were Proteobacteria, comprising 46.1% of the relative abundance, while the predominant bacteria in the nearby sea area comprised 42.0% of the relative abundance.
TSTTC: A Large-Scale Dataset for Time-to-Contact Estimation in Driving Scenarios
Time-to-Contact (TTC) estimation is a critical task for assessing collision
risk and is widely used in various driver assistance and autonomous driving
systems. The past few decades have witnessed development of related theories
and algorithms. The prevalent learning-based methods call for a large-scale TTC
dataset in real-world scenarios. In this work, we present a large-scale object
oriented TTC dataset in the driving scene for promoting the TTC estimation by a
monocular camera. To collect valuable samples and make data with different TTC
values relatively balanced, we go through thousands of hours of driving data
and select over 200K sequences with a preset data distribution. To augment the
quantity of small TTC cases, we also generate clips using the latest Neural
rendering methods. Additionally, we provide several simple yet effective TTC
estimation baselines and evaluate them extensively on the proposed dataset to
demonstrate their effectiveness. The proposed dataset is publicly available at
https://open-dataset.tusen.ai/TSTTC.
Comment: 19 pages, 9 figures
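A classic monocular baseline of the kind the abstract alludes to estimates TTC from the growth rate of an object's image size, with no depth sensor. The sketch below is a textbook scale-change formula under assumed constant closing speed and a fronto-parallel object, not a method taken from the TSTTC paper itself: image height h scales as 1/Z, so TTC ≈ dt / (h_curr/h_prev − 1).

```python
def ttc_from_scale(h_prev, h_curr, dt):
    """Estimate time-to-contact from bounding-box scale change.

    h_prev, h_curr: object image heights (e.g. box heights in pixels)
    at two frames dt seconds apart. For an approaching object, h grows
    as 1/Z, giving TTC ~= dt / (h_curr/h_prev - 1). Returns infinity
    when the object is not getting closer (scale not increasing).
    """
    scale = h_curr / h_prev
    if scale <= 1.0:
        return float('inf')
    return dt / (scale - 1.0)
```

Sanity check: an object 20 m away closing at 10 m/s is 19 m away 0.1 s later; its image height grows by a factor 20/19, and the formula recovers the true remaining TTC of 1.9 s.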
Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models
The recent performance leap of Large Language Models (LLMs) opens up new
opportunities across numerous industrial applications and domains. However,
erroneous generations, such as false predictions, misinformation, and
hallucination made by LLMs, have also raised severe concerns about the
trustworthiness of LLMs, especially in safety-, security-, and
reliability-sensitive scenarios, potentially hindering real-world adoption.
While uncertainty estimation has shown its potential for interpreting the
prediction risks made by general machine learning (ML) models, little is known
about whether and to what extent it can help explore an LLM's capabilities and
counteract its undesired behavior. To bridge the gap, in this paper, we
initiate an exploratory study on the risk assessment of LLMs from the lens of
uncertainty. In particular, we experiment with twelve uncertainty estimation
methods and four LLMs on four prominent natural language processing (NLP) tasks
to investigate to what extent uncertainty estimation techniques could help
characterize the prediction risks of LLMs. Our findings validate the
effectiveness of uncertainty estimation for revealing LLMs'
uncertain/non-factual predictions. In addition to general NLP tasks, we
extensively conduct experiments with four LLMs for code generation on two
datasets. We find that uncertainty estimation can potentially uncover buggy
programs generated by LLMs. Insights from our study shed light on future design
and development for reliable LLMs, facilitating further research toward
enhancing the trustworthiness of LLMs.
Comment: 20 pages, 4 figures
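One of the simplest uncertainty estimation techniques in the family the abstract surveys is predictive entropy over the model's next-token distributions. The sketch below is a generic illustration of that idea, not one of the paper's twelve specific methods; `token_probs` is assumed to be the per-position probability distributions an LLM assigns while generating a response.

```python
import math

def predictive_entropy(token_probs):
    """Mean per-token Shannon entropy of a generation.

    token_probs: list of probability distributions (one per generated
    token), each a list of probabilities summing to 1. Higher values
    indicate the model was less certain while generating, which can
    flag non-factual answers or buggy generated code for review.
    """
    total = 0.0
    for dist in token_probs:
        total += -sum(p * math.log(p) for p in dist if p > 0)
    return total / len(token_probs)
```

A fully confident generation (probability 1 on every token) scores 0, while a generation sampled from uniform two-way choices scores log 2 per token, so thresholding this score gives a crude risk filter.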
Question Decomposition Tree for Answering Complex Questions over Knowledge Bases
Knowledge base question answering (KBQA) has attracted a lot of interest in
recent years, especially for complex questions which require multiple facts to
answer. Question decomposition is a promising way to answer complex questions.
Existing decomposition methods split the question into sub-questions according
to a single compositionality type, which is not sufficient for questions
involving multiple compositionality types. In this paper, we propose Question
Decomposition Tree (QDT) to represent the structure of complex questions.
Inspired by recent advances in natural language generation (NLG), we present a
two-staged method called Clue-Decipher to generate QDT. It can leverage the
strong ability of NLG model and simultaneously preserve the original questions.
To verify that QDT can enhance KBQA task, we design a decomposition-based KBQA
system called QDTQA. Extensive experiments show that QDTQA outperforms previous
state-of-the-art methods on ComplexWebQuestions dataset. Besides, our
decomposition method improves an existing KBQA system by 12% and sets a new
state-of-the-art on LC-QuAD 1.0.
Comment: Accepted by AAAI202
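The tree structure behind QDT can be pictured with a minimal data type. This sketch is our own illustration of the representation, not the authors' implementation: internal nodes hold composite questions, leaves hold atomic sub-questions, and the example question and its decomposition are invented for demonstration.

```python
from dataclasses import dataclass, field

@dataclass
class QDTNode:
    """A node in a question decomposition tree: leaves are atomic
    sub-questions; an internal node's children jointly answer it,
    allowing multiple compositionality types to mix in one tree."""
    text: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Collect the atomic sub-questions in left-to-right order."""
        if not self.children:
            return [self.text]
        out = []
        for child in self.children:
            out.extend(child.leaves())
        return out

# Hypothetical decomposition mixing intersection and chaining.
root = QDTNode(
    "Who directed the film that won award X and starred actor Y?",
    [QDTNode("Which film won award X and starred actor Y?",
             [QDTNode("Which films won award X?"),
              QDTNode("Which films starred actor Y?")]),
     QDTNode("Who directed [film]?")])
```

A KBQA system such as QDTQA can then answer the leaves against the knowledge base and compose the results bottom-up along the tree.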
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Recently, large language models (LLMs) have made significant advancements in
natural language understanding and generation. However, their potential in
computer vision remains largely unexplored. In this paper, we introduce a new,
exploratory approach that enables LLMs to process images using the Scalable
Vector Graphics (SVG) format. By leveraging the XML-based textual descriptions
of SVG representations instead of raster images, we aim to bridge the gap
between the visual and textual modalities, allowing LLMs to directly understand
and manipulate images without the need for parameterized visual components. Our
method facilitates simple image classification, generation, and in-context
learning using only LLM capabilities. We demonstrate the promise of our
approach across discriminative and generative tasks, highlighting its (i)
robustness against distribution shift, (ii) substantial improvements achieved
by tapping into the in-context learning abilities of LLMs, and (iii) image
understanding and generation capabilities with human guidance. Our code, data,
and models can be found at https://github.com/mu-cai/svg-llm.
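The core idea, feeding an LLM the XML text of an SVG instead of raster pixels, can be sketched as a prompt-construction step. The prompt wording and the toy square icon below are our own assumptions for illustration; the paper's actual prompting setup may differ.

```python
def svg_prompt(svg_markup, question):
    """Wrap SVG markup in a plain-text prompt so a text-only LLM can
    'see' the image through its XML description."""
    return ("The following SVG markup describes an image:\n"
            f"{svg_markup}\n"
            f"Question about the image: {question}")

# A hand-written 16x16 icon; any vector image serialized as SVG works.
square = ('<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16">'
          '<rect x="4" y="4" width="8" height="8" fill="black"/></svg>')
prompt = svg_prompt(square, "What shape is shown?")
```

Because the image is now ordinary text, classification, generation (emitting new SVG markup), and in-context learning all reduce to standard LLM text completion.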
Molecular state interpretation of charmed baryons in the quark model
Stimulated by an observation reported by the Belle Collaboration, the
corresponding pentaquark systems are
investigated in the framework of the quark delocalization color screening
model (QDCSM). The real-scaling method is utilized to check for bound states
and genuine resonance states. The root-mean-square cluster spacing is also
calculated to study the structure of the states and to estimate whether a
state is a resonance or not. The numerical results show that
cannot be interpreted as a molecular state, and cannot be
explained as the molecular state with . can
be interpreted as the molecular state with and the main
component is . can be interpreted as the
molecular state with and the main component is
. is likely to be interpreted as a
molecular state with , and the main component is . Besides,
two new molecular states are predicted: one is a resonance state with a mass
around 3140 MeV, and the other has a mass of 3188.3 MeV.
Comment: 12 pages, 3 figures
LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Over the past decade, Artificial Intelligence (AI) has achieved great success
and is being used in a wide range of academic and industrial fields. More
recently, LLMs have made rapid advancements that have propelled AI to a
new level, enabling even more diverse applications and industrial domains with
intelligence, particularly in areas like software engineering and natural
language processing. Nevertheless, a number of emerging trustworthiness
concerns and issues exhibited in LLMs have already recently received much
attention, without properly solving which the widespread adoption of LLMs could
be greatly hindered in practice. The distinctive characteristics of LLMs, such
as the self-attention mechanism, extremely large model scale, and
autoregressive generation schema, differ from classic AI software based on CNNs
and RNNs and present new challenges for quality analysis. To date, universal
and systematic analysis techniques for LLMs are still lacking despite the
urgent industrial demand. Towards bridging this gap, we initiate an early
exploratory study and propose a universal analysis framework for LLMs, LUNA,
designed to be general and extensible, to enable versatile analysis of LLMs
from multiple quality perspectives in a human-interpretable manner. In
particular, we first leverage the data from desired trustworthiness
perspectives to construct an abstract model as an auxiliary analysis asset,
which is empowered by various abstract model construction methods. To assess
the quality of the abstract model, we collect and define a number of evaluation
metrics, aiming at both the abstract-model level and the semantics level. Then,
the semantics, i.e., the degree to which the LLM satisfies the desired
trustworthiness perspective, is bound to the abstract model, enriching it and
enabling more detailed analysis applications for diverse purposes.
Comment: 44 pages, 9 figures
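The abstract-model construction step can be illustrated with a minimal version of one common choice: bin continuous model states into discrete abstract states and estimate a Markov chain's transitions from observed traces. This is a generic sketch of that family of methods under our own simplifying assumption of 1-D states, not LUNA's actual construction.

```python
from collections import Counter, defaultdict

def build_abstract_model(traces, n_bins, lo, hi):
    """Abstract 1-D continuous states into equal-width bins over [lo, hi)
    and estimate discrete transition probabilities from the traces,
    yielding a small human-interpretable Markov-chain model."""
    width = (hi - lo) / n_bins

    def abstract(x):
        # Clamp into [0, n_bins - 1] so out-of-range states still map.
        return min(n_bins - 1, max(0, int((x - lo) / width)))

    counts = defaultdict(Counter)
    for trace in traces:
        states = [abstract(x) for x in trace]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {s: {t: c / sum(nxt.values()) for t, c in nxt.items()}
            for s, nxt in counts.items()}

# One toy trace oscillating between the two halves of [0, 1).
model = build_abstract_model([[0.1, 0.9, 0.1]], n_bins=2, lo=0.0, hi=1.0)
```

Quality metrics for the abstract model (coverage, succinctness, and the like) can then be computed over these discrete states, and trustworthiness semantics attached to each state for analysis.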
Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation with a Unified Model
Chemical reactions are the fundamental building blocks of drug design and
organic chemistry research. In recent years, there has been a growing need for
a large-scale deep-learning framework that can efficiently capture the basic
rules of chemical reactions. In this paper, we have proposed a unified
framework that addresses both the reaction representation learning and molecule
generation tasks, which allows for a more holistic approach. Inspired by the
organic chemistry mechanism, we develop a novel pretraining framework that
enables us to incorporate inductive biases into the model. Our framework
achieves state-of-the-art results on challenging downstream tasks. By
possessing chemical knowledge, our generative framework overcomes the
limitations of current molecule generation models that rely on a small number
of reaction templates. In the extensive experiments, our model generates
synthesizable drug-like structures of high quality. Overall, our work presents
a significant step toward a large-scale deep-learning framework for a variety
of reaction-based applications.